Abstract
Background: Iron deficiency anemia (IDA) affects approximately 1.2 billion people worldwide, yet ferritin-based diagnosis is confounded by inflammation, pregnancy and hemoglobinopathies. During the past decade, artificial intelligence/machine learning (AI/ML) methods have been explored to enhance IDA screening, differential diagnosis, risk prediction and treatment optimization.
Aims: To map and critically appraise the global evidence on AI/ML applications for IDA across all populations, data modalities and care settings.
Methods: We conducted this review in accordance with PRISMA 2020. PubMed (n=229), Embase (n=630), and Scopus (n=241) were searched from March 2015 to March 2025 with controlled vocabulary and free-text terms for “iron deficiency,” “anemia,” “artificial intelligence,” “machine learning,” “deep learning,” and specific algorithm classes (e.g., random forest, convolutional neural network). Only peer-reviewed, English-language studies that developed, validated or updated an AI/ML model to screen, diagnose, predict, or guide management of IDA in humans were eligible; conference abstracts without subsequent full papers were excluded. After duplicates were removed using EndNote (n=199), 901 unique citations were screened; 863 were excluded at title/abstract level. Thirty-eight full texts were assessed; 12 did not meet the inclusion criteria, leaving 26 studies. Risk of bias and applicability were independently assessed with PROBAST (prediction models) or QUADAS-AI (diagnostic models); disagreements were resolved by consensus or third-reviewer adjudication. Where multiple metrics were reported, AUROC was prioritized, and when multiple model versions existed, the best-performing externally validated model was abstracted. Performance was synthesized using medians and interquartile ranges; random-effects meta-analysis of AUROC was not performed because methodological heterogeneity produced I² > 90% across all subgroups.
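The study-selection flow reported above can be checked with simple arithmetic. The following is a minimal sketch, not part of the review's own methods; the variable names are illustrative assumptions, and only the counts are taken from the text.

```python
# Record counts as reported in the Methods; all names here are illustrative.
records_by_database = {"PubMed": 229, "Embase": 630, "Scopus": 241}
duplicates_removed = 199
excluded_title_abstract = 863
excluded_full_text = 12

# Each stage of the PRISMA flow is the previous stage minus the exclusions.
identified = sum(records_by_database.values())
screened = identified - duplicates_removed
full_text_assessed = screened - excluded_title_abstract
included = full_text_assessed - excluded_full_text

print(identified, screened, full_text_assessed, included)  # 1100 901 38 26
```

The derived totals (901 screened, 38 full texts, 26 included) agree with the figures stated in the text.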
Results: Twenty-six primary studies (2015-2025) encompassing 45,688 participants/images/specimens were grouped into four application domains.
• CBC-based differential diagnosis (15/26, 58%): Random-forest, gradient-boosting and SVM models distinguished IDA from β-thalassemia trait with pooled median AUROC 0.92 (IQR 0.88-0.96). The only study with true external validation achieved AUROC 0.94 in 3,211 patients from three independent hospitals.
• Non-invasive image screening (5/26, 19%): Smartphone-compatible CNNs assessing palmar, conjunctival or nail-bed photos detected moderate-to-severe IDA with pooled sensitivity 0.88 and specificity 0.91; the best pediatric model reached AUROC 0.95 on 1,040 external images.
• Prediction of incident/post-operative IDA (3/26, 12%): Linear SVM and XGBoost models integrating baseline laboratory values forecast new-onset IDA 6-12 months after bariatric surgery, chemotherapy or dialysis (AUROC 0.74-0.81; external AUROC 0.80 for the sleeve-gastrectomy cohort).
• Treatment optimization/decision support (3/26, 12%): Reinforcement learning or feed-forward networks cut inappropriate red-cell transfusions by 9-17% and suggested personalized iron dosing while maintaining hemoglobin stability.
Across all studies, the median analytic sample size was 629 (IQR 312-1,863). Only 4/26 papers (15%) reported any form of external or temporal validation; the remainder relied solely on internal testing. Fifty-four percent were judged as high risk of bias, chiefly because of retrospective single-center designs and inadequate handling of missing data. Fairness analyses were reported in just one study, and none included cost-effectiveness or clinical impact assessment.
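The domain shares and the external-validation proportion reported above follow directly from the study counts. A minimal sketch of that arithmetic, assuming simple rounding to the nearest whole percent (the dictionary keys and the helper `share` are illustrative names, not taken from the review):

```python
# Study counts per application domain, as reported in the Results.
total_studies = 26
domain_counts = {
    "CBC-based differential diagnosis": 15,
    "Non-invasive image screening": 5,
    "Prediction of incident/post-operative IDA": 3,
    "Treatment optimization/decision support": 3,
}
externally_validated = 4

def share(n, total):
    """Return n/total as a percentage rounded to the nearest whole number."""
    return round(100 * n / total)

# The four domains should account for every included study.
assert sum(domain_counts.values()) == total_studies

domain_shares = {name: share(n, total_studies) for name, n in domain_counts.items()}
validation_share = share(externally_validated, total_studies)

print(domain_shares)      # shares of 58, 19, 12 and 12 percent
print(validation_share)   # 15
```

Because each share is rounded independently, the four percentages sum to 101 rather than 100.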
Conclusions: AI/ML tools can match or surpass traditional indices for laboratory differentiation of IDA and enable promising point-of-care smartphone screening. However, implementation is limited by scarce external validation, few impact studies, and minimal attention to fairness, calibration drift or health-economic value. Future research should prioritize multi-site prospective trials, external benchmarking datasets, adherence to TRIPOD+AI reporting, and integration of equity and cost-effectiveness analyses, especially in low-resource settings where IDA burden is greatest.